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Parametric Regression on the Grassmannian 

Yi Hong, Nikhil Singh, Roland Kwitt, Nuno Vasconcelos and Marc Niethammer 


Abstract —We address the problem of fitting parametric curves on the Grassmann manifold for the purpose of intrinsic parametric 
regression. As customary in the literature, we start from the energy minimization formulation of linear least-squares in Euclidean spaces 
and generalize this concept to general nonflat Riemannian manifolds, following an optimal-control point of view. We then specialize this 
idea to the Grassmann manifold and demonstrate that it yields a simple, extensible and easy-to-implement solution to the parametric 
regression problem. In fact, it allows us to extend the basic geodesic model to (1) a “time-warped” variant and (2) cubic splines. We 
demonstrate the utility of the proposed solution on different vision problems, such as shape regression as a function of age, traffic- 
speed estimation and crowd-counting from surveillance video clips. Most notably, these problems can be conveniently solved within the 
same framework without any specifically-tailored steps along the processing pipeline. 

Index Terms —Parametric regression, Grassmann manifold, geodesic shooting, time-warping, cubic splines 
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1 Introduction 

ANY data objects in computer vision problems 
admit a subspace representation. Examples in¬ 
clude feature sets obtained after dimensionality reduc¬ 
tion via principal component analysis (PCA), observabil¬ 
ity matrix representations of linear dynamical systems, 
or landmark-based representations of shapes. Assuming 
equal dimensionality (e.g., the same number of land¬ 
marks), data objects can be interpreted as points on the 
Grassmannian G(p, n), i.e., the manifold of p-dimensional 
linear subspaces of R". The seminal work of [1] and 
the introduction of efficient processing algorithms to 
manipulate points on the Grassmannian [2] has led to a 
variety of principled approaches to solve different vision 
and learning problems. These include domain adapta¬ 
tion [3], [4], gesture recognition [5], face recognition 
under illumination changes [6], or the classification of 
visual dynamic processes [7]. Other works have explored 
subspace estimation via conjugate gradient descent [8], 
mean shift clustering [9], or the definition of suitable 
kernel functions [10], [11], [12] that can be used with 
a variety of kernel-based machine learning techniques. 

Since, most of the time, the primary objective is 
to perform classification or recognition tasks on the 
Grassmannian, the problem of intrinsic regression in a 
parametric setting has gained little attention. However, 
modeling the relationship between manifold-valued data 
and associated descriptive variables has the potential 
to address many problems in a principled way. For 
instance, it enables predictions of the descriptive vari¬ 
able while respecting the geometry of the underlying 
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space. Further, in scenarios such as shape regression 
— a common problem in computational anatomy — 
we are specifically interested in summarizing continu¬ 
ous trajectories that capture variations in the manifold¬ 
valued variable as a function of the scalar independent 
variable. Fig. 1 illustrates these two inference objectives. 
While predictions of the scalar-valued variable could, 
in principle, be formulated within existing frameworks 
such as Gaussian processes or support vector regression, 
e.g., by using Grassmann kernels [10], [11], it is unclear 
how to or if it is possible to address the second inference 
objective in such a formulation. 

In this work, we propose an approach to intrinsic 
regression that allows us to directly fit parametric curves 
to a collection of data points on the Grassmann man¬ 
ifold, indexed by a scalar-valued variable. Preliminary 
versions of this manuscript [13], [14] essentially fo¬ 
cused on fitting geodesics and how to re-parametrize 
the independent variable to increase flexibility. Here, 
we first recapitulate the optimal-control perspective of 
curve fitting in Euclidean space as an example and then 
discuss extensions of linear and cubic spline regression 
on the Grassmannian. The proposed models are simple 
and natural extensions of classic regression models in 
Euclidean space. They provide a compact representation 
of the complete curve, as opposed to discrete curve 
fitting approaches for instance which typically return 
a sampling of the sought-for curves. In addition, the 
parametric form of the curves, e.g., given by initial con¬ 
ditions, allows to freely move along them and synthesize 
additional observations. Finally, parametric regression 
opens up the possibility of statistical analysis of curves 
on the manifold, which is essential for comparative 
studies in medical imaging for instance. 

We demonstrate the versatility of the approach on 
two types of vision problems where data objects admit 
a representation on the Grassmannian. First, we model 
the aging trends in human brain structures and the 
rat calvarium under an affine-invariant representation 
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Fig. 1 : Illustration of parametric regression and inference. At the point marked ®, the inference objective for (i) traffic videos is to 
predict the independent variable r* (e.g., speed), whereas for (ii) corpus callosum shapes we seek the manifold-valued T* at an 
independent variable (e.g., age). Here, elements on the Grassmannian are visualized as lines through the origin, i.e., e <5(1,2). 


of shape [15]. Second, we use our models to predict 
traffic speed and crowd counts from dynamical system 
representations of surveillance video clips ivithont any 
specifically tailored preprocessing. All these problems 
are solved within the same framework with minor pa¬ 
rameter adjustments. 

The paper is organized as follows: Section 2 discusses 
related work about regression on nonflat Riemannian 
manifolds. Section 3 recapitulates the problems of linear, 
time-warped and cubic spline regression in Euclidean 
space from an optimal-control point of view. These ideas 
are then extended to Riemannian manifolds (Section 4) 
and specialized to the Grassmannian (Section 5). Ex¬ 
periments on toy examples and real applications are 
presented in Section 6 and 7, respectively Section 8 
concludes the paper with a review of the main points 
and a discussion of open problems. 

2 Previous work 

At the coarsest level, we distinguish between two cat¬ 
egories of regression approaches: parametric and non- 
parametric strategies, with all the known trade-offs on 
both sides [ 6]. In fact, non-parametric regression on 
nonflat manifolds has gained considerable attention over 
the last years. Strategies range from kernel regression 
[17] on the manifold of diffeomorphic transformations 
to gradient-descent [ 8] approaches on manifolds com¬ 
monly encountered in computer vision [19], such as the 
group of rotations SO( 3) or Kendall's shape space. In 
other works, discretizations of the curve fitting problem 
have been explored [20], [21] which, in some cases, even 
allow to employ second-order optimization methods 
[22]. Because our work is a representative of the paramet¬ 
ric category, we mostly focus on parametric approaches 
in the following review. 

While differential geometric concepts, such as 
geodesics and intrinsic higher-order curves, have 
been well studied [23], [24], their use for parametric 


regression, i.e., finding parametric relationships between 
the manifold-valued variable and an independent 
scalar-valued variable, has only recently gained interest. 
A variety of methods extending concepts of regression 
in Euclidean spaces to nonflat manifolds have been 
proposed. Rentmeesters [25], Fletcher [26] and Hinkle 
et al. [27] address the problem of geodesic fitting 
on Riemannian manifolds, primarily focusing on 
symmetric spaces, to which the Grassmannian belongs. 
Niethammer et al. [28] generalized linear regression 
to the manifold of diffeomorphisms to model image 
time-series data, followed by works extending this 
concept [29], [30] and enabling the use of higher-order 
models [31]. 

From a conceptual point of view, we can identify 
two groups of solution strategies to solve parametric 
regression problems on nonflat manifolds: first, geodesic 
shooting based strategies which address the problem 
using adjoint methods from an optimal-control point of 
view [28], [29], [30], [31]; the second group comprises 
strategies which are based on optimization techniques 
that leverage Jacobi fields to compute the required gradi¬ 
ents [25], [26]. Unlike Jacobi field approaches, solutions 
using adjoint methods do not require computation of 
the curvature explicitly and easily extend to higher- 
order models, e.g., polynomials [27] or splines [31]. Our 
approach is a representative of the adjoint approach, 
thereby ensuring extensibility to more advanced models, 
such as the proposed cubic splines extension. 

In the context of computer-vision problems, Lui [5] 
recently adapted the known Euclidean least-squares so¬ 
lution to the Grassmann manifold. While this strategy 
works remarkably well for the presented gesture recog¬ 
nition tasks, the formulation does not guarantee the 
minimization of the sum-of-squared geodesic distances 
within the manifold, which would be the natural exten¬ 
sion of least-squares to Riemannian manifolds according 
to the regression literature. Hence, the geometric and 
variational interpretation of [5] remains unclear. In con- 
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Fig. 2 : Illustration of time-warped regression in M. The dashed 
straight-line (middle) shows the fitting result in the warped time 
coordinates, and the solid curve (right) demonstrates the fitting 
result to the original data points (left). 


trast, we address the problem from the aforementioned 
energy-minimization point of view which allows us to 
guarantee, by design, the consistency with the geometry 
of the manifold. 

To the best of our knowledge, the closest works to ours 
are [32], [25] and, to some extent, [27]. Batzies et ol. [32] 
discuss only a theoretical characterization of the geodesic 
fitting problem on the Grassmannian, but do not provide 
a numerical strategy for estimation. In contrast, we 
derive alternative optimality conditions using principles 
from optimal-control. These optimality conditions not 
only form the basis for our shooting approach, but also 
naturally lead to a convenient iterative algorithm. By 
construction, the obtained solution is guaranteed to be 
a geodesic. As discussed above, Rentmeesters [ 15] fol¬ 
lows the Jacobi field approach. While both optimization 
methods have the same computational complexity for 
the gradient, i.e., Ojnp 2 ) on the Grassmannian Q(p, n), it 
is not trivial to generalize [25] to higher-order models. 
Hinkle et al. [27] address the problem of fitting polyno¬ 
mials, but mostly focus on manifolds with a Lie group 
structure 1 . In that case, adjoint optimization is greatly 
simplified. However, in general, curvature computations 
are required which can be tedious. Our approach, on the 
other hand, offers an alternative, simple solution that 
is (i) extensible, (ii) easy to implement and (iii) does 
not require specific knowledge of differential geometric 
concepts such as curvature or Jacobi fields. 

3 Regression in via Optimal-Control 

We begin our discussion with a review of linear regres¬ 
sion in Euclidean space (R n ) and discuss its solution via 
optimal-control. While regression is a well studied statis¬ 
tical technique and several solutions exist for univariate 
and multivariate models, we will see that the presented 
optimal-control perspective not only allows to easily 
generalize regression to manifolds but also to define 
more complex parametric models on these manifolds. 

3.1 Linear regression 

A straight line in W" can be defined as an acceleration- 
free curve with parameter t, represented by states, 
(aq (t),x 2 (t)), such that aq = x 2 , and x 2 = 0, where 
aq(t) € R" is the position of a particle at time t and 

1. G(p,n) does not possess such a group structure. 


x 2 (t) G R n represents its velocity at t. Let € R" 

denote a collection of N measurements at time instances 
t 0 , , fjv-i with t,j € [0,1]. We define the linear regres¬ 

sion problem as that of estimating a parametrized linear 
motion of the particle aq(t), such that the path of its 
trajectory best fits the measurements in the least-squares 
sense. The unconstrained optimization problem, from an 
optimal-control perspective, is 


min 

© 


JV-l 


£(©)= E - Vi\\ 2 + 

i =0 

/ Xj (aq - x 2 ) + Aj( x 2 ) dt , 

Jo 


( 1 ) 


with © = {xi(0)}|_ 1 , i.e., the initial conditions, and 
Ai, A 2 € R n are time-dependent Lagrangian multipliers. 
For readability, we have omitted the argument t for 
Ai(f) and X 2 (t). These variables are also referred to 
as adjoint variables, enforcing the dynamical "straight- 
line" constraints. Evaluating the gradients with respect 
to the state variables results in the adjoint system as 
Ai = 0, and — A 2 = Ai, with jumps in Ai as Ai {tjj) — 
Ai(t~) = 2(aq(L) — yi), at measurements f,. The op¬ 
timality conditions on the gradients also result in the 
boundary conditions Ai (1) = 0 and A 2 (1) = 0. Finally, 
the gradients with respect to the initial conditions are 


V Xl (o)-E — — Ai(0), and V a , 2 (o)£'— — A 2 ( 0 ) . (2) 

These gradients are evaluated by integrating backward 
the adjoint system to t = 0 starting from t = 1. 

This optimal-control perspective constitutes a general 
method for estimating first-order curves which allows 
to generalize the notion of straight lines to manifolds 
(geodesics), as long as the forward system (dynamics), 
the gradient computations, as well as the gradient steps 
all respect the geometry of the underlying space. 


3.2 Time-warped regression 

Fitting straight lines is too restrictive for some data. 
Hence, the idea of time-warped regression is to use a 
simple model to warp the time-points, or more generally 
the independent variable, when comparison to data is 
performed, e.g., as in the data matching term of Eq. (1). 
The time-warp should maintain the order of the data, and 
hence needs to be diffeomorphic. This is conceptually 
similar to an error-in-variables model where uncertainties 
in the independent variables are modeled. However, 
in the concept of time-warping, we are not directly 
concerned with modeling such uncertainties, but instead 
in obtaining a somewhat richer model based on a known 
and easy-to-estimate linear regression model. 

In principle, the mapping of the time points could be 
described by a general diffeomorphism. In fact, such an 
approach is followed in [33] for spatio-temporal atlas¬ 
building in the context of shape analysis. Our motivation 
for proposing an approach to linear regression with 
parametric time-warps is to keep the model simple while 
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Fig. 3: Cubic spline regression in R. The leftmost side shows the regression result, and the remaining plots show the other states. 


gaining more flexibility. Extensions to non-parametric 
approaches can easily be obtained. A representative of a 
simple parametric regression model is logistic regression 2 
which is typically used to model saturation effects. Un¬ 
der this model, points that are close in time for the linear 
fit may be mapped to points far apart in time, thereby 
allowing to model saturations for instance ( cf. Fig. 2). 
Other possibilities of parametric time-warps include 
those derived from families of quadratic, logarithmic and 
exponential functions. 

Formally, let / : R — > R, t H>■ t = f(t ; 9) denote a 
parametrized (by 9) time-warping function and let X\ (t) 
denote the particle on the regression line in the warped 
time coordinates t. Following this notation, the states are 
denoted as (xi(t),x 2 (t)) and represent position and slope 
in re-parametrized time t. In time-zvarped regression, the 
data matching term of Eq. (1) then becomes 

JV-l 

£lM/(h;0))-^ll 2 , (3) 

i =0 

and the objective (as before) is to optimize aq (to) and 
X 2 (to) as well as the parameter 9 of f(t; 9). 

A convenient way to minimize the energy functional 
in Eq. (1) with the data matching term of Eq. (3), is to use 
an alternating optimization strategy. That is, we first fix 
9 to update the initial conditions, and then fix the initial 
conditions to update 9. This requires the derivative of 
the energy with respect to 9 for fixed aq (t). Using the 
chain rule, we obtain the gradient VgE as 

N -1 

2'£(x 1 (f(t i -,9))-y i ) T x 1 (f(t l - : 9))X7 0 f(t i -,9 ) . (4) 

i—0 

Given a numerical solution to the regression problem 
of Section 3.1, the time-warped extension alternatingly 
updates (a) the initial conditions (aq(to), x 2 (to)) the 
warped time domain using the gradients in Eq. (2) and 
(b) 9 using the gradient in Eq. (4). Fig. 2 visualizes the 
principle of time-warped linear regression on a collec¬ 
tion of artificially generated data points. While the new 
model only slightly increases the overall complexity, it 
notably increases modeling flexibility by using a curve 
instead of a straight line. 

2. Not to be confused with the statistical classification method. 


3.3 Cubic spline regression 

To further increase the flexibility of a regression model, 
cubic splines are another commonly used technique. In 
this section, we revisit cubic spline regression in R" from 
the optimal-control perspective. This will facilitate the 
transition to general Riemannian manifolds. 

3.3.1 Variational formulation 

An acceleration-controlled curve with time-dependent 
states (xi,x 2 ,x 3 ) such that aq = x 2 and ± 2 = x 3r defines 
a cubic curve in R n . Such a curve is a solution to the 
energy minimization problem, cf. [34], 

mm E(0) = ll I hall 2 *, (5) 

subject to aq = x 2 and x 2 = x 3 , 

with © = {aq(t)} 3 =1 . Here, x 3 is referred to as the control 
variable that describes the acceleration of the dynamics 
in this system. Similar to the strategy for fitting straight 
lines, we can get a relaxation solution to Eq. (5) by 
adding adjoint variables which leads to the system of 
adjoint equations Ai = 0 and x 3 = — Ai. 

3.3.2 From relaxation to shooting 

To obtain the shooting formulation, we explicitly add the 
evolution of x 3 , i.e., x 3 = — Ai, as another dynamical con¬ 
straint; this increases the order of the dynamics. Setting 
Xi = - A-| results in the classical system of equations for 
shooting cubic curves 

Xi = x 2 (f), x 2 = x 3 (t), x 3 =Xi(t), aq = 0 . (6) 

The states (aq, x 2 , x 3 , aq), at all times, are entirely deter¬ 
mined by their initial values {.r,(0) }j =] and, in particular 
we have x\(t) = £i(0) + x 2 (0)t + \x 3 (h)t 2 + gX 4 ( 0 )t 3 . 

3.3.3 Data-independent controls 

Using the shooting equations of Eq. (6) for cubic splines, 
we can define a smooth curve that best fits the data in the 
least-squares sense. Since a cubic polynomial by itself is 
restricted to only fit "cubic-like" data, we add flexibility 
by gluing together piecewise cubic polynomials. Typi¬ 
cally, we define controls at pre-defined locations, and 
only allow the state Xi to jump at those locations. 
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We let {t c }c =1 ,t c £ (0,1) denote C data-independent 
fixed control points, which implicitly define C +1 inter¬ 
vals in [0,1], denoted as . The constrained energy 

minimization problem corresponding to the regression 
task, in this setting, can be written as 


G +1 


mm 

© 


2/d I 2 , 


subject to 


fi( 0 )=EElW*‘)-: 

c—1 i£T c 

±1 = X 2 (t), X2 = X 3 (t ), 

X 3 = X 4 (t), ±4 = 0, 

and x \ . x 2 , x 3 continuous across t c , 


■ within I c 


(7) 


with parameters © = {{a7(0)]d =1 , {x 4 (t c )} C3 =1 }. Using 
time-dependent adjoint states {Ai]d =1 for the dynamics 
constraints, and (time-independent) duals for the 
continuity constraints, we derive the adjoint system of 
equations from the unconstrained Lagrangian as 


Ai — 0, A 2 — —Ai, A 3 — A 2 , A 4 — —A 3 . ( 8 ) 


The gradients with respect to the initial conditions for 
states {a;j(0)}f =1 are 

V Xl (o)U = — Ai(0), V X2 ( 0 )U = —A 2 (0), 

V X3 (o)E = — A 3 (0), y Xi (o)E = -A4(0) . (9) 

The jerks (i.e., rate of acceleration change) at x 4 (t c ) 
are updated using S/ Xi r tc )E = —A 4 (t c ). The values of 
the adjoint variables at 0 are computed by integrating 
backward the adjoint system starting from A x ( 1 ) = 0 for 
i = 1 ,..., 4. Note that Ai, A 2 and A 3 are continuous 
at joints, but Ai jumps at the data-point location as 
per Ai (tf) — Ai(f“) = 2(xi(U) — yi). During backward 
integration, A 4 starts with zero at each interval at f c+1 
and the accumulated value at t c is used for the gradient 
update of x 4 (t c ). 

It is critical to note that, along the time t, such a formu¬ 
lation guarantees that: (a) x 4 (t) is piecewise constant, (b) 
x 3 (t) is piecewise linear, (c) x 2 (t) is piecewise quadratic, 
and (d) x\ (t) is piecewise cubic. Thus, this results in a 
cubic spline curve. Fig. 3 demonstrates this shooting- 
based spline fitting method on scalar-valued data. While 
it is difficult to explain this data with one simple cubic 
curve, it suffices to add one control point to recover the 
underlying trend. The state x 4 experiences a jump at the 
control location that integrates up three-times to give a 
C 2 -continuous evolution for the state x 4 . 


4 Regression on Riemannian Manifolds 

In this section, we adopt the optimal-control perspec¬ 
tive of previous sections and generalize the regression 
problems to nonflat, smooth Riemannian manifolds. In 
the literature this generalization is typically referred 
to as geodesic regression. For a thorough treatment of 
Riemannian manifolds, we refer the reader to [35], [36]. 
We remark that the term geodesic regression here does 
not refer to the model that is fitted but rather to the 
fact that the Euclidean distance in the matching term 


of the energies is replaced by the geodesic distance on 
the manifold. In particular, the measurements {t/tldo 1 ' n 
Euclidean space now become elements {U }^ 1 on some 
Riemannian manifold Ai with Riemannian metric (•, -) p 
at p € Ai 3 . The geodesic distance, induced by this metric, 
will be denoted as d a . For generality, we also replace t, 
with ri, indicating that the independent value does not 
have to be time, but can also represent other entities, e.g., 
counts or speed. 

Our first objective is to estimate a geodesic 7 : R —> 
Ad, represented by initial conditions 7 ( 7 - 0 ) (i.e., initial 
point), and 7 (r 0 ) (i.e., initial velocity at the tangent space 
T)( ro )Wl), while solving 

f 1 1 V—^ 

mm E(&) =a J (7,7> 7 ( r ) dr +^ d l{l{ r i)> Y i) 

---' J=° (10) 

Regularity Data-matching 

subject to Vy 7 = 0 (geodesic equation) , 

with © = { 7 ( 0 ), 7 ( 0 )} and V denoting the Levi-Civita 
connection on Ai. The covariant derivative V -7 of 0 
ensures that the curve is a geodesic. The parameters 
a > 0 and er > 0 balance the regularity and the data- 
matching term. In the Euclidean case, there is typically 
no regularity term because we usually do not have prior 
knowledge about the slope. Similarly, on Riemannian 
manifolds we may penalize the initial velocity by choos¬ 
ing a > 0; but typically, a is also set to 0. The regu¬ 
larity term on the velocity can be further reduced to a 
smoothness penalty at r 0 , i.e., f 0 ( A f,'y)dr = ( 7 ( 7 - 0 ), 7 ( 7 - 0 )), 
because of the energy conservation along the geodesic. 
Also, since the geodesic is represented by the initial con¬ 
ditions ( 7 ( 7 - 0 ), 7 (^ 0 ))/ we can move along the geodesic 
and estimate the point 7 ( 77 ) that corresponds to K,. 

4.1 Optimization via geodesic shooting 

Taking the optimal-control point of view, the second- 
order problem of Eq. (10) can be written as a system 
of first-order, upon the introduction of auxiliary states 

Xi(r) = 7 ( 7 -), and A' 2 (r) = 7 ( 7 -) . (11) 

Here, X- { corresponds to the intercept and X 2 corresponds 
to the slope in classic linear regression. Considering 
the simplified smoothness penalty of the previous sec¬ 
tion, the original constrained minimization problem of 
Eq. (10) reduces to 

nun E(&) = a(X 2 (r 0 ),X 2 (r 0 )} + 
n- 1 

^'52<P g (X 1 (r i ),Y i ) (12) 

i=0 

subject to Vx 2 A 2 = 0 , 

with © = {ATj(ro)}i =1 . Note that X 4 ( 77 ) is the estimated 
point on the geodesic at 7 \, obtained by shooting forward 

3. We omit the subscript p when it is clear from the context. 





with X-[ (r'o) and X^fro). Analogously to the elaborations 
of previous sections, we convert Eq. (12) to an un¬ 
constrained minimization problem via time-dependent 
adjoint variables, then take variations with respect to 
its arguments and eventually get ( 1 ) dynamical systems 
of states and adjoint variables, ( 2 ) boundary conditions 
on the adjoint variables, and (3) gradients with respect 
to initial conditions. By shooting forward / backward 
and updating the initial states via the gradients, we can 
obtain a numerical solution to the problem. 

4.2 Time-warped regression 

The time-warping strategy of Section 3.2 can be also 
adapted to Riemannian manifolds, because it focuses 
on warping the axis of the independent scalar-valued 
variable, not the axis of the dependent manifold-valued 
variable. In other words, the time-warped model is inde¬ 
pendent of the underlying type of space. Formally, given 
a warping function / ( cf. Section 3.2), all instances of the 
form Xiijri) in Eq. (12) are replaced by X. i (f(r i : 9)). While 
the model retains its simplicity, i.e., we still fit geodesic 
curves, the warping function allows for increased mod¬ 
eling flexibility. 

Since we have an existing solution to the problem of 
fitting geodesic curves, the easiest way to minimize the 
resulting energy is by alternating optimization, similar 
to Section 3.2. This requires the derivative of the energy 
with respect to 6 for fixed X(r). While the derivation is 
slightly more involved, application of the chain rule and 
[18, Appendix A] yields 

X e E = 2a(X : 2 (/(r 0 ; 9)), X 2 (/(r„; 0))>V e /(r o ; 9) 

2 N ~ 1 • (13) 

- ~yi U{r«e)) Y i,XiU{n-,9)))'Vef{r i -,9) 

2—0 

where Log Xl ift ri -e))Yi denotes the Riemannian log- 
map, i.e., the initial velocity of the geodesic connecting 
Xi(f(rf,9)) and Y t in unit time and X 1 (/(r i ;0)) is the 
velocity of the regression geodesic at the warped-time 
point. This leaves to choose a good parametric model for 
f(r; 9). As we require the time warp to be diffeomorphic, 
we choose a parametric model which is diffeomorphic 
by construction. One possible choice is the generalized 
logistic function [37], e.g., with asymptotes 0 for r —> — oo 
and 1 for r —> oo, given by 

f^ 0 ) = (1 _|_ p e -k(r-M)y/m ’ 

with 9 = (k,M,(3,m). The parameter k controls the 
growth rate, M is the time of maximum growth if j3 = m, 
/3 and m define the value of / at t = M, and m > 0 
affects the asymptote of maximum growth. By using 
this function, we map the original infinite time interval 
to a warped time-range from 0 to 1. In summary, the 
algorithm using alternating optimization is as follows: 

0) Initialize 9 such that the warped time is evenly 
distributed within ( 0 , 1 ). 


1) Compute {fi = f(r,: 9) J .^ 1 and perform standard 
geodesic regression using the new time-points. 

2) Update 9 by numerical optimization using the gra¬ 
dient given in Eq. (13). 

3) Check convergence. If not converged goto 1). 

4.3 Cubic spline regression 

Similar to Section 3.3, cubic curves on a Riemannian 
manifold A4 can be defined as solutions to the vari¬ 
ational problem of minimizing an accelleration-based 
energy. The notion of acceleration is defined using the co¬ 
variant derivatives on Riemannian manifolds [23], [24]. 
In particular, we define a time-dependent control, i.e., a 
forcing variable X 3 {r), as 

X 3 (r) = V X2i r)X 2 (r) = . 

We can interpret X :t (r) as a control that forces the curve 
X-\ (r) to deviate from being a geodesic [38] (which is 
the case if X;-)(r) = 0). As an end-point problem, a 
Riemannian cubic curve is thus defined by the curve 
X\ (r) such that it minimizes an energy of the form 

E (Xi) = \ j\ 

where the norm || • || is induced by the metric on M at 
X\. In Section 5.5, this concept will be adapted to the 
Grassmannian to enable regression with cubic splines. 

5 Regression on the Grassmannian 

The Grassmannian is a type of Riemannian manifold 
where the geodesic distance, parallel transport, as well 
as the Riemannian log-/exp-map are relatively simple 
to compute [2]. Before specializing our three regression 
models to this manifold, we first discuss its Riemannian 
structure in Section 5.1 (see [39] for a thorough treat¬ 
ment) and review how different types of data can be 
represented on the Grassmannian in Section 5.2. 

5.1 Riemannian structure of the Grassmannian 

The Grassmann manifold Q(jp,n) is defined as the set 
of p-dimensional linear subspaces of R", typically rep¬ 
resented by an orthonormal matrix Y G R" x v , such 
that the column vectors span y, i.e., y = span(Y). 
It can equivalently be defined as a quotient space 
within the special orthogonal group SO{n) as Q(p,n) := 
SO(n)/{SO{n — p) x SO(p)). The canonical metric g y : 
TyG{p, n) x TyG(p , n) -t R on G(p , n) is given by 

gy(Ay, A y ) = tr Aj = tr C T (I„ - YY T )C , (15) 

where I ra denotes the n x n identity matrix, 7yG{p,n) 
is the tangent space at y, C G R nx/ ' is arbitrary, and 
Y is a representer for y. Under this choice of metric, 
the arc-length of the geodesic connecting two subspaces 
y, Z G G(p, n) is related to the canonical angles 4>i,... (f> p G 
[0,7t/2] between y and Z as d^(y,Z) = ||0|||. In what 
follows, we slightly change notation and use d 2 g { Y, Z), 
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Algorithm 1: Standard Grassmannian geodesic regression (Std-GGR) 


Solve 


Data: {(r.i, Y,)}"^ 1 , a and a 2 
Result: X^ro), X 2 (r 0 ) 

Set initial Xi(r 0 ) and X 2 (r 0 ), e.g., Xi(r 0 ) = Y 0 , and X 2 (r 0 ) = 0. 
while not converged do 

Solve Eqs. (18) with Xi(r 0 ) and X 2 (ro) forward for r £ [ro, rjv-i]- 
'\i = A 2 XjX 2 , A 1 (r JV _ 1 +) =0, 

A 2 = —Xi + X 2 (AjXx + X^A 2 ), A 2 (r N . l ) = 0 

Ai(rj-) = Ai(rj+) - prV Xl ( ri )C^(Xi(r i ), Y»), and Vx^c^X^), Y*) computed as -2Log Xi(r . ) (Y i ). For 

multiple measurements at a given r,, the jump conditions for each measurement are added up. 

Compute gradients with respect to initial conditions: 

V Xl( r 0 )£ = -(In-X 1 (ro)X 1 (ro) T )A 1 (ro-)+X 2 (ro)A 2 (r 0 ) T X 1 (ro), 

X7 X2(ro) E = 2aX 2 (r 0 )-(I„-X 1 (r 0 )X 1 (r 0 ) T )A 2 (r 0 ). 


backward with jump conditions 


end 


Use a line search with these gradients to update X^ro) and X 2 (r 0 ) (see supplementary material). 


with y = span(Y) and Z = span(Z). In fact, the 
(squared) geodesic distance can be computed from the 
SVD decomposition U(cosS)V T = Y T Z as d 2 (Y, Z) = 
11 cos -1 (diag S) 11 2 (cf. [ 2 ]), where S is a diagonal matrix 
with principal angles <j>i. 

Finally, consider a curve 7 : [0,1] —t G(p,n),r i-t 7 ( 7 -) 
such that 7 ( 0 ) = and 7 ( 1 ) = Oi, with Oo represented 
by Y 0 and Oi represented by Yi. The geodesic equation for 
such a curve, given that Y = d /drY(r) = (I„ — YY T )C, 
on Q (p, n) is given by 

Y(r) + Y(r)[Y(r) T Y(r)] = 0 , (16) 

which also defines the Riemannian exponential map on 
the Grassmannian as an ODE for convenient numerical 
computations. Integrating Eq. (16), starting with initial 
conditions, "shoots" the geodesic forward in time. 

5.2 Representation on the Grassmannian 

We particularly describe two types of data objects that 
can be represented as points on the Grassmannian: linear 
dynamical systems (LDS) and shapes. 

5.2.1 Linear dynamical systems 

In the computer vision literature, dynamic texture models 
[ ] are commonly applied to model videos as realiza¬ 

tions of linear dynamical systems (LDS). For a video, rep¬ 
resented by a collection of vectorized frames yi, ■ • ■, y T 
with y i £ R n , the standard dynamic texture model with 
p states has the form 

x fc+1 = Ax fc + w fc , w fc ~7V'(0,W), 

yfc = Cxi + Vfc, v fc ~ J\f(0, R) , (17) 

with x fc e RP,A e R pxp , and C e R nx P. When 
relying on the prevalent estimation approach of [40], the 
matrix C is, by design, of (full) rank p (i.e., the number 
of states) and by construction we obtain an observable 
system, where a full rank observability matrix O £ W npxp 


is defined as O = [C (CA) (CA 2 ) ••• (CA P-1 )] T . 
This system identification is not unique because systems 
(A, C) and (TAT -1 , CT -1 ) with T e QC(pf have the 
same transfer function. Hence, the realization subspace 
spanned by O is a point on the Grassmannian Q (p. n) 
and the observability matrix is a representer of this 
subspace. We identify an LDS model for a video by its 
np x p orthonormalized observability matrix. 

5.2.2 Shapes 

We consider shapes as represented by a collection of m 
landmarks. A shape matrix is constructed from its m land¬ 
marks as L = [(27, 2 / 1 ,...); (x 2 ,V 2 , •••);■•.; (x m ,2/m, ■•■)]• 
Using SVD on this matrix, i.e., L = USV T , we obtain 
an affine-invariant shape representation from the left- 
singular vectors U [15], [41]. This establishes a mapping 
from the shape matrix to a point on the Grassmannian 
(with U as the representative). Such a representation has 
been used for facial aging regression for instance [42]. 

5.3 Standard geodesic regression 

We start by adapting the generic inner-product and the 
squared geodesic distance in Eq. (10) to the Riemannian 
structure of G(p,n). Given the auxiliary states of Eq. 
(11), now denoted as matrices Xi (initial point) and X 2 
(velocity), we can write the geodesic equation of Eq. (16) 
as a system of first-order dynamics: 

Xi=X 2 , and X 2 =-Xx(X^X 2 ) . (18) 

For a point on the Grassmannian, it should further 
hold that (1) Xi(r) T Xi(r) = I p and (2) the velocity 
at Xi(r) needs to be orthogonal to that point, i.e., 
Xj (r) T X 2 (r) = 0. If we enforce these two constraints at 
the starting point r 0 , they will remain satisfied along the 

4. QC(p) is the general linear group of p x p invertible matrices. 






Algorithm 2 : Cubic-spline Grassmannian geodesic regression (CS-GGR) 

Data: {(r i; Yi)}^ 1 , {r c }f =1 , a and a 2 

Result: Xx(r 0 ), X 2 (r„), X 3 (r 0 ), X 4 (r 0 ) / {X 4 (r+)}?=i 

Set initial Xi(ro) as Y 0 for example, and X 2 (ro), X 3 (ro), X 4 (r 0 ), {X 4 (r+)}e=i as zero matrices, 
while not converged do 

Solve Eq. (22) forward in each interval with Xi(ro), X 2 (r 0 ), X 3 (ro), X 4 (ro), {X 4 (r+)}^ =1 , and 
{Xr(r+) = X 4 (r“), X 2 (r+) = X 2 (r"), X 3 (r+) = X 3 (r c -)} c c =1 . 

{ Ai = X 2 Xj X 2 - A 3 (XJXx - XjX 2 ) - X 4 (AJXx + XjA 4 ), 

A 2 = -Ax + X 2 (AJXx + XjA 2 - AjX 3 - XjA 4 ) + X 3 (Aj X, + XjA 4 ) + A 4 (-X^X 4 + XjX 3 ), 
A 3 = -A 2 - A 4 XjX 2 + X 2 (X7A 3 + AjX 2 ) + 2aX 3 , 

A 4 = A 3 -Xx(X7A 3 + AIX 2 ) 

backward with Ax(rjv-i) = A 2 (rjv-i) = A 3 (vn-i) = A 4 (rxv_x) = A 4 (r“) = 0, and 

{Ai (t-~) = Ax(r+), A 2 (r~) = A 2 (r+), A 3 (r~) = A 3 (r+)}M/ as well as jump conditions 

Me”) = Mc + ) - ^ 2 V Xl (r i )£^(Xi(r i ), Yj) ; and V Xl ( n )^(Xi(r i ) ) Y*) computed as - 2 Log Xl(r .. ) (Y i ). For 

multiple measurements at a given r lr the jump conditions for each measurement are added up. 

Compute gradients with respect to initial conditions and the fourth state at control points: 

V Xl(ro) £ = -(I„ - X 1 (r 0 )X 1 (r 0 ) T )A 1 (r 0 -) + X 2 (ro)A 2 (r 0 ) T Xi(r 0 ) + X 3 (ro)A 3 (r 0 ) T Xi(r 0 ), 
V X2(ro) £ = -(I„-Xx(ro)Xi(r 0 ) T )A 2 (ro), Vx 3(ro) £ = -(I„ - Xx(ro)Xi(ro) T )A 3 (r 0 ), 

V X4(ro) S = —A 4 (r 0 ), V X4(t ,+ } £; = -A 4 (r+), c= 1...C . 

Use a line search with these gradients to update Xx(ro), X 2 (r 0 ), X 3 (r 0 ), X 4 (ro), and {X 4 (r+)}S=i- 

end 


geodesic. Hence, the constrained minimization problem 
for standard Grassmannian geodesic regression is 

nun E(&) = a tr X 2 (r 0 ) T X 2 (r 0 ) + 

N-l 

^E^Xxf^Y-) (19) 

*=o v ’ 

subject to Xi(r 0 ) T Xi(r 0 ) = I p , 

Xx(r 0 ) T X 2 (r- 0 ) = OandEq. (18) , 

with 0 = {Xj(ro)}f =1 . As in previous sections, based on 
the adjoint method we obtain the shooting solution to 
Eq. (19), listed in Algorithm 1. We notice that the jump 
conditions on Ax involve the gradient of the residual 
term d 2 (Xi(ri),Yi) with respect to Xx(r-i), i.e., the base 
point of the residual on the fitted geodesic (see sup¬ 
plementary material, the gradient is —2Log Xl ^ r .)(Yj)). 
We refer to this problem of fitting a geodesic curve as 
standard Grassmannian geodesic regression (Std-GGR). 

5.4 Time-warped regression 

Since the concept of time-warped geodesic regression is 
general for Riemannian manifolds, specialization to the 
Grassmannian is straightforward. We only need to use 
the Std-GGR solution during the alternating optimiza¬ 
tion steps. By choosing the generalized logistic function 
of Eq. (14) to account for saturations of scalar-valued 
outputs, the time-warped model on G(p , n) can be used 
to capture saturation effects for which standard geodesic 
regression is insensible. We refer to this strategy as time- 
warped Grassmannian geodesic regression (TW-GGR). 


5.5 Cubic spline regression 

To enable cubic spline regression on the Grassmannian, 
we follow Section 4.3 and add the external force X 3 . 
In other words, we represent an acceleration-controlled 
curve Xx(r) on G(p,n ) using a dynamic system with 
states (Xx,X 2 ,X 3 ) such that 

X 2 =Xx, and X 3 = X 2 +Xi(X^X 2 ) . (20) 

Note that if X 3 = 0 , the second equation is reduced 
to the geodesic equation of Eq. (16); this indicates that 
the curve is acceleration-free. To obtain an acceleration- 
controlled curve, we need to solve the following con¬ 
strained minimization problem: 

mm E(@) = i^ trXjX,* (21) 

subject to X^Xx = Ip, X ] r X 2 = 0, and Eq. (20) 

with 0 = {X,;(ro)}f =1 . The relaxation solution to this 
problem gives us (see supplementary material) the sys¬ 
tem of equations for shooting cubic curves on G(p,n): 

Xx=X 2 , X 2 =X 3 -XxXjX 2 , 

X 3 = -X 4 + XiXx r X 4 -XiX^X 3 , (22) 

X 4 = X 3 X 2 t X 2 + X 2 XjXx - X 2 X^X 2 . 

It is important to note that X 4 does not follow a 
geodesic path under non-zero force X 3 . Hence, the 
constraints Xx(r) T Xx(r) = I p and Xx(r) T X 2 (r) = 0 
should be enforced at every instance of r to keep the 
path on the manifold. However, we can show (see sup¬ 
plementary material) that enforcing Xx(r) T X 2 (r) = 0 
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Method 

Xi(ro) 

X 2 (r 0 ) 

X 3 (r 0 ) 

X 4 s 

k 

M 

MSD (G.T. vs. Data) 

MSD (Est. vs. Data) 

MSD (G.T. vs. Est.) 

Std-GGR 

0.0207 

0.1124 

- 

- 

- 

- 

7.0e-3 

6.6e-3 

0.3e-3 

TW-GGR 

0.0206 

0.1619 

- 

- 

0.0524 

0.0056 

6.9e-3 

6.6e-3 

0.3e-3 

CS-GGR 

0.0672 

0.5389 

0.3600 

0.9687 

- 

- 

6.8e-3 

5.8e-3 

l.le-3 


TABLE 1: Regression comparison with respect to (1) the initial conditions, (2) the parameters of the time-warp function ( k,M ), 
and (3) mean square distance (MSD) among the ground truth (G.T.), estimated regression curves (Est.), and data points. For Xi: 
geodesic distance on the Grassmannian; for X 2 , X 3 , X 4 : ||Xf st - - Xf- T -||F/||X; •|| F ; for multiple X 4 s, the mean is reported. 


at all times already guarantees that Xj (r) 1 X| (r) = I p 
if this holds initially at r = 0. Also, Xi(r) T X 2 (r) = 0 
implies that X | (r ) 1 X 3 (r) = 0 . By using this fact during 
relaxation, the constraints are already implicitly captured 
in Eqs. (22). Subsequently, for shooting we only need to 
guarantee that all these constraints hold initially. 

To get a cubic spline curve, we follow Section 3.3.3 and 
introduce control points {r c }^ =1 , which divide the sup¬ 
port of the independent variable into several intervals I c . 
The first three states should be continuous at the control 
points, but the state X 4 is allowed to jump. Hence, the 
spline regression problem on G{p,n) becomes, cf. Eq.(7), 

pP N — 1 

min E(&) = a tr X^X 3 dr + 

® Jr 0 

N- 1 

i =0 

subject to Xi(r 0 ) T Xi(r 0 ) = I p , (23) 

X 1 (r 0 ) T X 2 (r 0 ) = 0, 

X 1 (r 0 ) T X 3 (r o ) = O, 

Xi,X 2 ,X 3 are continuous at {r c }£L 1; 
and Eqs. (22) holds in each X r , 

with 0 = {{X, ; (ro)}f = 1 ,{X 4 (r+)}j? =1 }. Algorithm 2 lists 
the shooting solution to Eq. (23), referred to as cubic- 
spline Grassmannian geodesic regression (CS-GGR). 

6 Experiments on toy-examples 

We first demonstrate Std-GGR, TW-GGR and CS-GGR 
on synthetic data. Each data point represents a 2D 
sine/cosine signal, sampled at 630 points in [0,107r] and 
embedded in R 24 . In particular, the 2D signal s £ R 2 xfj 3 u 
is linearly projected via s = Us, where W ~ Af(0, 1 24 ) 
and W = USV T . Finally, white Gaussian noise with 
a = 0.1 is added to s. For each signal s £ ^24x630, 
we estimate a (two-state, p = 2) LDS as discussed in 
Section 5.2.1, and use the corresponding observability 
matrix to represent it as a point on G(2,48). Besides, each 
data point has an associated scalar value; this indepen¬ 
dent variable is uniformly distributed within (0,10) and 
controls the signal frequency of the data point. For Std- 
GGR, we directly use this value as the signal frequency 
to generate 2D sine/cosine signals, while for TW-GGR 
and CS-GGR, a generalized logistic function or a sine 
function is adopted to convert the values to a signal 
frequency for data generation. It is important to note 
that the largest eigenvalue of the state-transition matrix 
A reflects the frequency of the sine/cosine signal. 


20 24 49 63 74 90 

G^b 

Fig. 4: Corpora callosa (with the subject’s age) [26]. 



17.1 [mph] 25.3 [mph] 57.7 [mph] 63.4 [mph] 


Fig. 5: Examples of the UCSD traffic dataset [43] with associ¬ 
ated speed measurements. 


To quantitatively assess the quality of the fitting re¬ 
sults, we design a "denoising" experiment. The data 
to be used for denoising is generated as follows: First, 
we use each regression method to compute a model on 
the (clean) data points we just generated. In the second 
step, we take the initial conditions of each model, shoot 
forward and record the points on the regression curve at 
fixed values of the independent variable (i.e., the signal 
frequency). These points serve as our ground truth. In 
a final step, we take each point on the ground truth 
model, generate a random tangent vector at that location 
and shoot forward along that vector for a small time 
( e.g ., 0.03). The newly generated points then serve as the 
"noisy" measurements of the original points. 

To obtain fitting results for the noisy data, we initialize 
the first state X 4 with the first data point, and all other 
initial conditions with 0 . Table 1 lists the differences 
between our estimated regression curves and the corre¬ 
sponding ground truth using two strategies: ( 1 ) compar¬ 
ison of the initial conditions as well as the parameters of 
the warping function in TW-GGR; (2) comparison of the 
full curves (sampled at the values of the independent 
variable) and the data points. The numbers indicate that 
all three models allow us to capture different types of 
relationships on the Grassmannian. 


7 Applications 

To demonstrate Std-GGR, TW-GGR and CS-GGR on 
actual vision data, we present four applications: in the 
first two applications, we regress the manifold-valued 
variable, i.e., landmark-based shapes; in the last two 
applications, we predict the independent variable based 
on the regression curve fitted to the manifold-valued 
data, i.e., LDS representations of surveillance videos. 
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Fig. 6: Example frames from the UCSD pedestrian dataset [44] 
(Pedsi). Bottom: Total people count over all frames (left), and 
average people count over a 400-frame sliding window (right). 


7.1 Datasets 

Corpus callosum shapes [26]. We use a collection of 32 
corpus callosum shapes with ages varying from 19 to 
90 years, see Fig. 4. Each shape is represented by 64 2D 
boundary landmarks, and is projected to a point on the 
Grassmannian using the representation of Section 5.2.2. 
Rat calvarium landmarks [45]. We use 18 individuals 
with 8 time points from the Vilmann rat data 5 , each in 
the age range of 7 to 150 days. Each shape is represented 
by a set of 8 landmarks. Fig. 8 (left) shows the landmarks 
projected onto the Grassmannian, using the same repre¬ 
sentation as the corpus callosum data. 

UCSD traffic dataset [43]. This dataset was introduced 
in the context of clustering traffic flow patterns with 
LDS models. It contains a collection of short traffic video 
clips, acquired by a surveillance system monitoring 
highway traffic. There are 253 videos in total and each 
video is roughly matched to the speed measurements 
from a highway-mounted speed sensor. We use the pre- 
processed video clips introduced in [ 3] which were con¬ 
verted to grayscale and spatially normalized to 48 x 48 
pixels with zero mean and unit variance. Our rationale 
for using an LDS representation for speed prediction is 
the fact that clustering and categorization experiments 
in [43] showed compelling evidence that dynamics are 
indicative of the traffic class. We argue that the notion 
of speed of an object (e.g., a car) could be considered a 
property that humans infer from its visual dynamics. 
UCSD pedestrian dataset [44]. We use the Pedsi subset 
of the UCSD pedestrian dataset which contains 4000 
frames with a ground-truth people count associated 
with each frame. Fig. 6 (bottom left) shows the total 
people count over all frames. Similar to [44] we ask the 
question whether we can infer the number of people in 
a scene (or clip) without actually detecting the people. 
While this problem has been addressed by resorting 
to crowd / motion segmentation and Gaussian process 
regression on low-level features extracted from the seg¬ 
mentation regions, we go one step further and try to 
avoid any preprocessing at all. In fact, our objective is to 
infer an average people count from an LDS representation 
of short video segments (i.e., within a temporal sliding 
window). This is plausible because the visual dynamics 

5. Online: http://life.bio.sunysb.edu/morph/data/datasets.html 


of a scene change as people appear in it. In fact, it could 
be considered as another form of "traffic". Further, an 
LDS does not only model the dynamics, but also the 
appearance of videos; both aspects are represented in the 
observability matrix of the system. We remark, though, 
that such a strategy does not allow for fine-grain frame- 
by-frame predictions as in [ 4]. Yet, it has the advantages 
of not requiring any pre-selection of features or potential 
unstable preprocessing steps such as the aforementioned 
crowd segmentation. 

In our setup, we split the 4000 frames into 37 video 
clips of 400 frames each, using a sliding window with 
steps of 100 frames, illustrated in Fig. 6 (bottom right), 
and associate an average people count with each clip. 
The video clips are spatially down-sampled to a res¬ 
olution of 60 x 40 pixel (original: 238 x 158) to keep 
the observability matrices at a reasonable size. Since the 
overlap between the clips potentially biases the experi¬ 
ments, we introduce a weighted variant of system identi¬ 
fication (see supplementary material) with weights based 
on a Gaussian function centered at the middle of the 
sliding window and a standard deviation of 100. While 
this ensures stable system identification, by still using 
400 frames, it reduces the impact of the overlapping 
frames on the parameter estimates. With this strategy, 
the average crowd count is localized to a smaller region. 

7.2 Regressing the manifold-valued variable 

The first category of applications leverages the regressed 
relationship between the independent variable, i.e., age, 
and the manifold-valued dependent variable, i.e., shapes. 
The objective is to estimate the shape for a given age. We 
demonstrate Std-GGR, TW-GGR and CS-GGR on both 
corpus callosum and rat calvarium data. Three measures 
are used to quantitatively compare the regression results: 
( 1 ) the regression energy, i.e., the data matching error over 
all observations; (2) the R 2 statistic on the Grassmannian, 
which is between 0 and 1 , with 1 indicating a perfect fit 
and 0 indicating a fit no better than the Frechet mean 
(see [46] for more details); and (3) the mean squared 
error (MSE) on the testing data, reported in a (leave-one- 
subject-out) crossvalidation (CV) setup. 

In all experiments of this paper, o in the cost function 
is set to 1 , the initial point is set to be the first data 
point, and all other initial conditions are set to zero. For 
the parameter(s) 6 of TW-GGR, we fix /3,m = 1 so that 
M is the time of the maximal growth. One control point 
is used in CS-GGR for the following two experiments, 
which is set to the mean age of the data points. 

Corpus callosum aging. Fig. 7 shows the corpus callo¬ 
sum shapes 6 along the fitted curves for the time points 
in the data. Table 2 lists the quantitative measurements. 
With Std-GGR, the corpus callosum starts to shrink 
from age 19, which is consistent with the regression 
results in [46] and [27]. However, according to biological 

6. The shapes are recovered from the points along the geodesic 
through scaling by the mean singular values of the SVD results. 
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Fig. 7: Comparison between Std-GGR, TW-GGR and CS-GGR (with one control point) on the corpus callosum data [26]. The 
shapes are generated along the fitted curves and are colored by age (best viewed in color). 



Fig. 8: Comparison between Std-GGR, TW-GGR and CS-GGR (with one control point) on the rat calvarium data [15]. The shapes 
are generated along the fitted curves and the landmarks are colored by age in days (best-viewed in color). 



Corpus callosum [ 6] 

Rat calvarium [45] 


Std-GGR 

TW-GGR 

CS-GGR (1) 

CS-GGR (2) 

Std-GGR 

TW-GGR 

CS-GGR (1) 

CS-GGR (2) 

Energy 

0.35 

0.34 

0.32 

0.31 

0.32 

0.18 

0.16 

0.16 

R 2 

0.12 

0.15 

0.21 

0.23 

0.61 

0.79 

0.81 

0.81 

MSE 

1.25e-2 

1.22e-2 

1.36e-2 

1.43e-2 

2.3e-3 

1.3e-3 

1.2e-3 

1.2e-3 


TABLE 2: Comparison of Std-GGR, TW-GGR and CS-GGR with one (1) and two (2) control points on the corpus callosum and rat 
calvarium datasets. For Energy and MSE smaller values are better, for R 2 larger values are better. 


studies [47], [48], the corpus callosum size remains stable 
during the most active years of the lifespan, which is 
consistent with our TW-GGR result. As we can see from 
the optimized logistic function in Fig. 9 (left), TW-GGR 
estimates that thinning starts at « 50 years, and at the 
age of 65, the shrinking rate reaches its peak. From 
the CS-GGR results, we first observe that the R 2 value 
increases notably to 0.21/0.23, compared to 0.12 for Std- 
GGR. While this suggests a better fit to the data, it is 
not a fair comparison, since the number of parameters 
for CS-GGR increases as well and a higher R 2 value 
is expected. Secondly, the more interesting observation 
is that, qualitatively, we observe higher-order shape 
changes in the anterior and posterior regions of the 
corpus callosum, shown in the zoomed-in regions of Fig. 
7; this is similar to what is reported in [27] for polyno¬ 
mial regression in 2D Kendall shape space. Flowever, 
our shape representation on G(p,n), by design, easily 
extends to point configurations in R :i . This is in contrast 
to 3D Kendall shape space which has a substantially 
more complex structure than its 2D variant [49]. 


1.0 

0.8 

0.6 

0.4 

0.2 

0 

0 25 45 65 75 100 0 30 50 100 150 

Fig. 9 : Estimated time-warp functions for TW-GGR. 


Corpus callosa Rat calvarium 




Rat calvarium growth. Fig. 8 (leftmost) shows the pro¬ 
jection of the original data on G(2,8), as well as data 
samples generated along the fitted curves. Table 2 lists 
the performance measures. From the zoomed-in regions 
in Fig. 8, we observe that the rat calvarium grows at an 
approximately constant speed during the first 150 days if 
the relationship is modeled by Std-GGR. Flowever, the 
estimated logistic curve for TW-GGR, shown in Fig. 9 
(right), indicates that the rat calvarium only grows fast 
in the first few weeks, reaching its peak at 30 days; 
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Fig. 10: Traffic speed predictions via 5-fold CV using Std-GGR 
(left) and its piecewise variant (right). The red solid curve shows 
the ground truth (best-viewed in color). 


then, the rate of growth gradually levels off and becomes 
steady after around 14 weeks. In fact, similar growth 
curves for the rat skull were reported in [50]. Based 
on their study, the growth velocities of viscerocranium 
length and nurocranium width rose to the peak in the 
26 — 32 days period. Comparing the R 2 values for TW- 
GGR and CS-GGR, we see an interesting effect: although, 
we have more parameters in CS-GGR, the R 2 score 
only marginally improves. This indicates that TW-GGR 
already sufficiently captures the relationship between 
age and shape. It further confirms, to a large extent, 
the hypothesis put forward in [27], where the authors 
noted that the cubic polynomial in 2D Kendall shape 
space essentially degrades to a geodesic under polyno¬ 
mial time reparametrization. Since TW-GGR, by design, 
reparametrizes time (not via a cubic polynomial, but via 
a logistic function), it is not surprising that this relatively 
simple model exhibits similar performance to the more 
complex CS-GGR model. 

Comparison to a Jacobi field approach. We compare the 
performance of our Std-GGR approach to a comparable 
approach using Jacobi fields [25]. Both methods are 
applied on the corpus callosum dataset. To quantitatively 
measure the differences between the regression results, 
we first compute the distance between the estimated ini¬ 
tial conditions. The geodesic distance between two initial 
points (Xi) is 9e-4, and the Frobenius norm between 
the two initial velocities (X 2 ) is 4e-3. Secondly, we use 
the initial conditions to shoot forward and calculate the 
geodesic distance between corresponding points on the 
two estimated geodesics. The mean geodesic distance is 
4e-4, which indicates that both methods provide similar 
solutions. Although they have comparable performance 
for standard regression, our Std-GGR has the advantage 
of being easily extensible to higher-order models, e.g., 
CS-GGR. 

7.3 Predicting the independent variable 

The second category of applications aims to predict the 
independent variable using its regressed relationship 
with the manifold-valued dependent variable. Specifi¬ 
cally, given a point on the Grassmannian, e.g., an LDS 
representation of a video clip, we search along the regressed 
curve (with a step size of 0.05 in our experiments) to 
find its closest point, and then take the corresponding 


Std-GGR CS-GGR 



Fig. 11 : Crowd counting results via 4-fold CV. Predictions are 
shown as a function of the sliding window index. The gray enve¬ 
lope indicates the weighted standard deviation (±1 <j) around the 
average crowd size in a sliding window (best-viewed in color). 

independent variable of this closest point as its predicted 
value. This could be considered a variant of nearest- 
neighbor regression where the search space is restricted 
to the sampled curve on the Grassmannian. The case 
when the search space is not restricted, but contains all 
points, will be referred to as our baseline. Note that in 
our strategy, search complexity is controlled via the step- 
size, while the search complexity for the baseline scales 
linearly (for each prediction) with the sample size. 

Furthermore, we remark that in this category of ap¬ 
plications, TW-GGR is not appropriate for predicting the 
independent variable for the following reasons: First, in 
case of the traffic speed measurement, the generalized 
logistic function tends to degenerate to almost a step- 
function, due to the limited number of measurement 
points in the central regions. In other words, two greatly 
different independent variables would correspond to 
two very close data points, even the same one, which 
would result in a large prediction error. Second, in 
case of crowd-counting, there is absolutely no prior 
knowledge about any saturation or growth effect which 
could be modeled via a logistic function. Consequently, 
we only demonstrate Std-GGR and CS-GGR on the two 
datasets. Note that prediction based on nearest neighbors 
could be problematic in case of CS-GGR, since the model 
does not guarantee a monotonic curve. We report the 
mean regression energy and the mean absolute error 
(MAE), computed over all folds in a crossvalidation 
setup with a dataset-dependent number of folds. 

Speed prediction. For each video clip, we estimate LDS 
models with p = 10 states. The control point of CS-GGR 
and the breakpoint for piecewise Std-GGR is set at 50 
[mph]. Results are reported for 5-fold CV, see Fig. 10. 
The quantitative comparison to the baseline in Table 3 
shows that piecewise Std-GGR has the best performance. 
Crowd counting. For each of the 37 video clips we 
extract from the Pedsl dataset, we estimate LDS models 
with p = 10 states using weighted system identification. 
For CS-GGR, the control point is set to a count of 23 
people which separates the 37 videos into two groups 
of roughly equal size. Quantitative results for 4-fold CV 
are reported in Table 3, Fig. 11 shows the predictions 
~os. the ground truth. As we see, both Std-GGR and CS- 
GGR output predictions "close" to the ground truth. 
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Traffic speed 

People counting 


Baseline 

Std-GGR 

Std-GGR (PW) 

CS-GGR 

Baseline 

Std-GGR 

Std-GGR (PW) 

CS-GGR 

Mean energy 

- 

2554.88 

2461.95 

2670.84 

- 

273.81 

224.87 

244.02 

Train-MAE 

- 

2.98 ± 0.33 

1.48 ± 0.07 

2.42 ±0.35 

- 

0.97 ± 0.07 

0.59 ± 0.13 

0.63 ±0.19 

Test-MAE 

4.14 ± 0.36 

4.44 ± 0.16 

3.46 ± 0.64 

6.32 ± 1.62 

2.40 ±0.53 

1.88 ± 0.75 

2.14 ±1.03 

2.11 ± 0.76 


TABLE 3: Mean energy and mean absolute errors (MAE) over all CV-folds ±1 <x on training and testing data. 


mostly within lex (shaded region) of the average crowd 
count. However, a closer look at Table 3 reveals a typical 
overfitting effect for CS-GGR: while the training MAE is 
quite low, the testing MAE is higher than for the simpler 
Std-GGR approach. While both models exhibit compara¬ 
ble performance (considering the standard deviation of 
« 0.7), Std-GGR is preferable, due to fewer parameters 
and its guaranteed monotonic regression curve. 

8 Discussion 

In this paper, we developed a general theory for para¬ 
metric regression on the Grassmann manifold from an 
optimal-control perspective. By introducing the basic 
principles for fitting models of increasing order for the 
special case of M. = K", we established the framework 
that was then used for a generalization to Riemannian 
and, in particular, the Grassmann manifold. We demon¬ 
strated that our solution to the parametric regression 
problem is simple, extensible, and easy to implement. 

From an application point of view, we have seen that 
quite different vision problems can be solved within 
the same framework under minimal data preprocessing. 
While the presented applications are limited to shape 
analysis and surveillance video processing, our method 
should, in principle, be widely applicable to other prob¬ 
lems on the Grassmann manifold, e.g., domain adap¬ 
tation, facial pose regression, or the recently proposed 
domain evolution problems. 

Regarding the limitations of the proposed approach, 
we note that the issue of model selection is critical. In 
fact, whether we should use Std-GGR, TW-GGR or CS- 
GGR highly depends on our prior knowledge of the data. 
In shape regression, for instance such prior knowledge 
is frequently available, since the medical / biological 
literature already provides evidence for different growth 
and saturation effects as a function of age. For appli¬ 
cations where prediction of the independent variable 
is of importance, e.g., traffic or or crowd surveillance, 
we additionally have computational constraints in many 
cases. Interestingly, a simple geodesic curve as a model 
for regression can often provide sufficiently good per¬ 
formance, as we observed in the crowd counting experi¬ 
ment. We hypothesize that this can be explained, to some 
extent, by the fact that geodesic regression respects the 
geometry of the underlying space. It is possible that in 
this space, the relationship between the dependent and 
the independent variable might actually be relatively 
simple to model. In contrast, approaches where video 
content is boiled down to feature vectors and conven¬ 
tional regression approaches with standard kernels are 


used, more flexible models might be needed. TW-GGR 
can serve as a hybrid solution when we have prior 
knowledge about the data; however, samples throughout 
the range of the independent variable are needed to 
avoid degenerate cases of the warping function. While 
this could be avoided via regularization, we did not 
explore this direction. 
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